@GuyEshdat GuyEshdat commented Aug 14, 2025

Summary by CodeRabbit

  • New Features
    • Added Dremio-compatible data type handling, covering string, numeric, timestamp, and boolean categories.
    • Improves behavior and consistency when working with Dremio-backed projects, aligning with other adapters.
    • Ensures stable handling of unsupported types without errors.

linear bot commented Aug 14, 2025

ELE-4931 dremio cll

@github-actions

👋 @GuyEshdat
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented Aug 14, 2025

Walkthrough

Added a new macro dremio__data_type_list(data_type) that returns Dremio-compatible type lists for string, numeric, timestamp, and boolean categories, or an empty list otherwise. Minor whitespace adjustments made around macro boundaries. No other functional changes.

Changes

Cohort / File(s) | Summary of Changes
Dremio data type list macro (macros/utils/data_types/data_type_list.sql) | Introduced dremio__data_type_list(data_type) returning predefined type lists (string, numeric, timestamp, boolean). Returns empty list for unsupported inputs. Minor whitespace updates.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Poem

I tapped my paw and typed with glee,
A list of types for Dremio’s tea.
Strings, nums, stamps, and boole so bright—
Now neatly grouped, all just right.
Hippity-hop, merge me through,
This bunny ships clean macros too! 🥕✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a4b2ee and 5a310e0.

📒 Files selected for processing (1)
  • macros/utils/data_types/data_type_list.sql (2 hunks)
🔇 Additional comments (2)
macros/utils/data_types/data_type_list.sql (2)

132-132: Whitespace-only change; no action needed.
Non-functional whitespace change after trino__data_type_list. Safe to leave as-is.


154-154: Whitespace-only change; no action needed.
Non-functional whitespace change after clickhouse__data_type_list endmacro. Safe to leave as-is.

Comment on lines +155 to +173
{% macro dremio__data_type_list(data_type) %}
    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}

    {%- if data_type == 'string' %}
        {{ return(string_list) }}
    {%- elif data_type == 'numeric' %}
        {{ return(numeric_list) }}
    {%- elif data_type == 'timestamp' %}
        {{ return(timestamp_list) }}
    {%- elif data_type == "boolean" %}
        {{ return(boolean_list) }}
    {%- else %}
        {{ return([]) }}
    {%- endif %}

{% endmacro %}
💡 Verification agent

🧩 Analysis chain

Dremio type list added — verify aliases and TZ types; consider adding REAL/BOOL
Good addition and consistent with the dispatch pattern. A couple of points to tighten correctness vs Dremio’s reported data_type values:

  • Add aliases: REAL (numeric) and BOOL (boolean) are commonly seen; consider including them.
  • Verify whether BIT is actually returned by Dremio INFORMATION_SCHEMA vs upstream sources; it might be safer to include BOOL and keep BIT only if confirmed.
  • Verify support for TIME WITH TIME ZONE and TIMESTAMP WITH TIME ZONE in Dremio; if unsupported, they should be removed to avoid false-positive matches.

Proposed minimal improvement (adds REAL and BOOL; leave BIT and TZ types pending verification):

 {% macro dremio__data_type_list(data_type) %}
     {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
-    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
+    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','REAL','SMALLINT','TINYINT'] | list %}
     {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
-    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+    {% set boolean_list = ['BOOLEAN', 'BOOL', 'BIT'] | list %}

Additionally, if Dremio reports fixed-length CHAR/CHARACTER in INFORMATION_SCHEMA for some sources, consider including:

  • string_list: add 'CHAR', 'CHARACTER'

Please confirm with Dremio docs and/or INFORMATION_SCHEMA outputs before finalizing removals.

To validate support and normalize to actual INFORMATION_SCHEMA values, please check Dremio docs for supported data types and the canonical strings reported in INFORMATION_SCHEMA:


🌐 Web query:

Dremio SQL data types and INFORMATION_SCHEMA data_type values. Specifically: does Dremio support SMALLINT, TINYINT, REAL, BOOL, BIT, TIME WITH TIME ZONE, and TIMESTAMP WITH TIME ZONE, and what are the exact canonical type names returned?

💡 Result:

Dremio does not natively support canonical SQL types SMALLINT, TINYINT, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE; it maps them to other types. The canonical type names returned in INFORMATION_SCHEMA and Dremio metadata for supported types are as follows:

Support for Specific Types & Canonical Type Names

Reported under their own names

  • TIME: Returned as TIME (without time zone)[1][6][7].
  • TIMESTAMP: Returned as TIMESTAMP (without time zone)[1][6][7].

Accepted, but reported under a different canonical name

  • BIT: Mapped to BOOLEAN[7][1].
  • BOOL: Returned as BOOLEAN[7][1].
  • REAL: Returned as FLOAT (single-precision float)[1][3][4][7].
  • SMALLINT: Mapped to INTEGER (never reported as "SMALLINT")[1][6][7][9].
  • TINYINT: Mapped to INTEGER (never reported as "TINYINT")[1][6][9].
  • TIME WITH TIME ZONE: Mapped to TIME; the time zone is dropped[6][7].
  • TIMESTAMP WITH TIME ZONE: Mapped to TIMESTAMP; the time zone is dropped[6][7].
Exact Canonical Type Names

The values Dremio reports in metadata (including INFORMATION_SCHEMA.COLUMNS) are the canonical type names listed below (case-sensitive, all uppercase):

  • INTEGER
  • BIGINT
  • FLOAT
  • DOUBLE
  • BOOLEAN
  • DECIMAL
  • VARBINARY
  • VARCHAR
  • DATE
  • TIME
  • TIMESTAMP
  • INTERVAL (day to seconds)
  • INTERVAL (years to months)

These are the types you will observe when querying for column data types in Dremio's metadata tables and via INFORMATION_SCHEMA[1][3][4][6][7].

Mappings from External Sources

  • SQL Server: SMALLINT, TINYINT, BIT → INTEGER; BOOL/BIT → BOOLEAN; REAL → FLOAT; no support for TIME WITH TIME ZONE or true TIMESTAMP WITH TIME ZONE (mapped to TIME/TIMESTAMP)[1].
  • PostgreSQL: BOOL/BIT → BOOLEAN; SMALLINT/TINYINT → INTEGER; REAL → FLOAT; TIMESTAMPTZ/TIMETZ → TIMESTAMP/TIME (zone info stripped)[7].
  • Teradata: SMALLINT/TINYINT/BYTEINT → INTEGER; TIME WITH TIMEZONE → TIME; TIMESTAMP WITH TIMEZONE → TIMESTAMP[6].

Summary Table

SQL Type | Canonical Dremio Type | Supported/Mapping Notes
SMALLINT | INTEGER | Mapped
TINYINT | INTEGER | Mapped
REAL | FLOAT | Canonical name is FLOAT
BIT | BOOLEAN | Mapped
BOOL | BOOLEAN | Canonical name is BOOLEAN
TIME WITH TIME ZONE | TIME | Zone info not stored
TIMESTAMP WITH TIME ZONE | TIMESTAMP | Zone info not stored

If you query INFORMATION_SCHEMA.COLUMNS, these canonical types (INTEGER, FLOAT, BOOLEAN, TIME, TIMESTAMP, etc.) will be returned—not the original SQL type names from the source system[1][3][6][7][9].

If you need to preserve the distinction between, for example, SMALLINT/TINYINT and INTEGER or retain time zone information, Dremio does not natively support this at the type level. You'll need to manage such semantics in your data model or ETL process outside Dremio.
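
The canonical names listed in this result can be checked against a live Dremio instance before the macro is trimmed. A minimal sketch, assuming the INFORMATION_SCHEMA."COLUMNS" view and its DATA_TYPE column are readable by the querying user; the column_count alias is illustrative and not taken from this PR:

```sql
-- List every distinct type name Dremio reports for the visible columns,
-- so the macro's lists can be compared with real metadata.
-- "COLUMNS" is quoted on the assumption that it is a reserved word in Dremio SQL.
SELECT
    DATA_TYPE,
    COUNT(*) AS column_count
FROM INFORMATION_SCHEMA."COLUMNS"
GROUP BY DATA_TYPE
ORDER BY column_count DESC
```

Any name kept in the macro that never appears in this output is a candidate for removal; any name in the output that the macro lacks is a candidate for addition.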

Update Dremio data_type_list to use only canonical INFORMATION_SCHEMA types

Dremio only returns the uppercase canonical type names—it does not emit SMALLINT, TINYINT, REAL, BOOL, BIT, or TZ-qualified types in INFORMATION_SCHEMA.COLUMNS. Tighten the macro to match exactly what Dremio reports:

• File: macros/utils/data_types/data_type_list.sql
• Lines: 155–173

 {% macro dremio__data_type_list(data_type) %}
-    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
-    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
-    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
-    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+    {% set string_list    = ['VARCHAR']                                         | list %}
+    {% set numeric_list   = ['INTEGER','BIGINT','FLOAT','DOUBLE','DECIMAL']       | list %}
+    {% set timestamp_list = ['DATE','TIME','TIMESTAMP']                           | list %}
+    {% set boolean_list   = ['BOOLEAN']                                          | list %}
 
     {%- if data_type == 'string' %}
         {{ return(string_list) }}
@@ unchanged @@
 {% endmacro %}

This aligns the lists with what Dremio’s INFORMATION_SCHEMA actually returns.
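
Why the exact strings matter can be seen from how such a list is typically consumed: a column's reported type name is tested for membership in the list. A minimal sketch in dbt Jinja, assuming the dremio__data_type_list macro from this PR is in scope; is_dremio_numeric and its argument are hypothetical names used only for illustration, not part of the package:

```sql
{# Hypothetical helper for illustration only; not part of this PR. #}
{% macro is_dremio_numeric(reported_data_type) %}
    {# Fetch the numeric type names defined by the macro under review. #}
    {% set numeric_types = dremio__data_type_list('numeric') %}
    {# Upper-case the reported name, then test membership in the list. #}
    {{ return((reported_data_type | upper) in numeric_types) }}
{% endmacro %}
```

Under this kind of check, an entry such as 'SMALLINT' is harmless but dead if Dremio never reports it, while a missing canonical name would cause a real column to be classified as unsupported.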

📝 Committable suggestion

Suggested change
 {% macro dremio__data_type_list(data_type) %}
-    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
-    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
-    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
-    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+    {% set string_list = ['VARCHAR'] | list %}
+    {% set numeric_list = ['INTEGER','BIGINT','FLOAT','DOUBLE','DECIMAL'] | list %}
+    {% set timestamp_list = ['DATE','TIME','TIMESTAMP'] | list %}
+    {% set boolean_list = ['BOOLEAN'] | list %}
     {%- if data_type == 'string' %}
         {{ return(string_list) }}
     {%- elif data_type == 'numeric' %}
         {{ return(numeric_list) }}
     {%- elif data_type == 'timestamp' %}
         {{ return(timestamp_list) }}
     {%- elif data_type == "boolean" %}
         {{ return(boolean_list) }}
     {%- else %}
         {{ return([]) }}
     {%- endif %}
 {% endmacro %}

@GuyEshdat GuyEshdat temporarily deployed to elementary_test_env August 17, 2025 11:13 — with GitHub Actions Inactive
@GuyEshdat GuyEshdat had a problem deploying to elementary_test_env August 17, 2025 11:13 — with GitHub Actions Failure
@GuyEshdat GuyEshdat merged commit 178e57c into master Aug 17, 2025
21 of 34 checks passed
@GuyEshdat GuyEshdat deleted the ele-4931-dremio-types-mapping branch August 17, 2025 11:15